Methods and systems for determining a wake word

ABSTRACT

A user device (e.g., voice assistant device, voice enabled device, smart device, computing device, etc.) may receive/detect audio content (e.g., speech, etc.) that includes a wake word and/or words similar to a wake word. The user device may require a wake word, a portion of the wake word, or words similar to the wake word to be detected prior to interacting with a user. The user device may, based on characteristics of the audio content, determine if the audio content originates from an authorized user. The user device may decrease and/or increase scrutiny applied to wake word detection based on whether audio content originates from an authorized user.

CROSS REFERENCE TO RELATED PATENT APPLICATION

This application claims priority under 35 U.S.C. § 120 to, and is a continuation of, U.S. patent application Ser. No. 17/187,182, filed Feb. 26, 2021, which claims priority under 35 U.S.C. § 120 to, and is a continuation of, U.S. patent application Ser. No. 16/189,937, filed Nov. 13, 2018, now U.S. Pat. No. 10,971,160, the entire contents of each of which are hereby incorporated herein by reference in their entirety for all purposes.

BACKGROUND

Homes and offices are becoming more connected with the proliferation of user devices, such as voice enabled smart devices. Users are able to interact with such user devices through natural language input such as speech. Typically, such user devices require a wake word in order to begin interacting with a user. The use of speech to interact with such user devices presents many challenges. One challenge concerns ensuring that the speech is intended to be a wake word. Another challenge is ensuring that speech that includes a wake word is recognized as originating from an authorized user, rather than from an unauthorized user or audio source (e.g., a television).

SUMMARY

It is to be understood that both the following general description and the following detailed description are exemplary and explanatory only and are not restrictive. Methods and systems for determining a wake word are described.

A wake word (or wake phrase) detected by a user device (e.g., voice assistant device, voice enabled device, smart device, computing device, etc.) may cause the user device to interact with a user (e.g., detect and/or process user commands and/or operational instructions, etc.). The user device may require that a wake word be received/detected prior to any interactions. The user device may detect speech and identify a potential wake word in the detected speech. The user device may assign a confidence score indicative of the accuracy of the detection of the wake word (e.g., did the user device detect the actual wake word, a different word, background noise, etc.). The user device may compare the confidence score to a threshold (e.g., a wake word detection threshold, etc.) to determine whether to accept the wake word or not. The user device may modify the threshold based on whether the user device recognizes the speech as originating from an authorized user. The user device may generate a voiceprint (e.g., one or more measurable characteristics of a human voice that uniquely identifies an individual, etc.) of the detected speech and compare the voiceprint to a stored voiceprint known to originate from an authorized user. Upon determining that the voiceprints match, and therefore that the speech originates from an authorized user, the user device may lower the threshold used to determine if the wake word was detected. Upon determining that the voiceprints do not match, and therefore that the speech did not originate from an authorized user, the user device may raise the threshold used to determine if the wake word was detected. The user device may thus decrease or increase the scrutiny applied to wake word detection based on whether the speech containing the wake word originates from an authorized user.

This summary is not intended to identify critical or essential features of the disclosure, but merely to summarize certain features and variations thereof. Other details and features will be described in the sections that follow.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a system for determining a wake word;

FIG. 2 shows a process for determining a wake word;

FIG. 3 shows a flowchart of a method for determining a wake word;

FIG. 4 shows a flowchart of a method for determining a wake word;

FIG. 5 shows a flowchart of a method for determining a wake word; and

FIG. 6 shows a block diagram of a computing device for implementing wake word determination.

DETAILED DESCRIPTION

As used in the specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Ranges may be expressed herein as from “about” one particular value, and/or to “about” another particular value. When such a range is expressed, another configuration includes from the one particular value and/or to the other particular value. When values are expressed as approximations, by use of the antecedent “about,” it will be understood that the particular value forms another configuration. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint.

“Optional” or “optionally” means that the subsequently described event or circumstance may or may not occur, and that the description includes cases where said event or circumstance occurs and cases where it does not.

Throughout the description and claims of this specification, the word “comprise” and variations of the word, such as “comprising” and “comprises,” means “including but not limited to,” and is not intended to exclude other components, integers or steps. “Exemplary” means “an example of” and is not intended to convey an indication of a preferred or ideal configuration. “Such as” is not used in a restrictive sense, but for explanatory purposes.

It is understood that when combinations, subsets, interactions, groups, etc. of components are described that, while specific reference of each various individual and collective combinations and permutations of these may not be explicitly described, each is specifically contemplated and described herein. This applies to all parts of this application including, but not limited to, steps in described methods. Thus, if there are a variety of additional steps that may be performed it is understood that each of these additional steps may be performed with any specific configuration or combination of configurations of the described methods.

As will be appreciated by one skilled in the art, hardware, software, or a combination of software and hardware may be implemented. Furthermore, a computer program product on a computer-readable storage medium (e.g., non-transitory) having processor-executable instructions (e.g., computer software) embodied in the storage medium may be implemented. Any suitable computer-readable storage medium may be utilized including hard disks, CD-ROMs, optical storage devices, magnetic storage devices, memristors, Non-Volatile Random Access Memory (NVRAM), flash memory, or a combination thereof.

Throughout this application reference is made to block diagrams and flowcharts. It will be understood that each block of the block diagrams and flowcharts, and combinations of blocks in the block diagrams and flowcharts, respectively, may be implemented by processor-executable instructions. These processor-executable instructions may be loaded onto a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the processor-executable instructions which execute on the computer or other programmable data processing apparatus create a device for implementing the functions specified in the flowchart block or blocks.

These processor-executable instructions may also be stored in a computer-readable memory that may direct a computer or other programmable data processing apparatus to function in a particular manner, such that the processor-executable instructions stored in the computer-readable memory produce an article of manufacture including processor-executable instructions for implementing the function specified in the flowchart block or blocks. The processor-executable instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the processor-executable instructions that execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block or blocks.

Accordingly, blocks of the block diagrams and flowcharts support combinations of devices for performing the specified functions, combinations of steps for performing the specified functions and program instruction means for performing the specified functions. It will also be understood that each block of the block diagrams and flowcharts, and combinations of blocks in the block diagrams and flowcharts, may be implemented by special purpose hardware-based computer systems that perform the specified functions or steps, or combinations of special purpose hardware and computer instructions.

“Content items,” as the phrase is used herein, may also be referred to as “content,” “content data,” “content information,” “content asset,” “multimedia asset data file,” or simply “data” or “information”. Content items may be any information or data that may be licensed to one or more individuals (or other entities, such as business or group). Content may be electronic representations of video, audio, text and/or graphics, which may be but is not limited to electronic representations of videos, movies, or other multimedia, which may be but is not limited to data files adhering to MPEG2, MPEG, MPEG4 UHD, HDR, 4k, Adobe® Flash® Video (.FLV) format or some other video file format whether such format is presently known or developed in the future. The content items described herein may be electronic representations of music, spoken words, or other audio, which may be but is not limited to data files adhering to the MPEG-1 Audio Layer 3 (.MP3) format, Adobe®, CableLabs 1.0, 1.1, 3.0, AVC, HEVC, H.264, Nielsen watermarks, V-chip data and Secondary Audio Programs (SAP). Sound Document (.ASND) format or some other format configured to store electronic audio whether such format is presently known or developed in the future. In some cases, content may be data files adhering to the following formats: Portable Document Format (.PDF), Electronic Publication (.EPUB) format created by the International Digital Publishing Forum (IDPF), JPEG (.JPG) format, Portable Network Graphics (.PNG) format, dynamic ad insertion data (.csv), Adobe® Photoshop® (.PSD) format or some other format for electronically storing text, graphics and/or other information whether such format is presently known or developed in the future. Content items may be any combination of the above-described formats.

The term “wake word” may refer to and/or include a “wake word,” a “wake phrase,” a “processing word,” a “processing phrase,” a “confirmation word,” a “confirmation phrase,” and/or the like, which are considered interchangeable, related, and/or the same.

Phrases used herein, such as “accessing” content, “providing” content, “viewing” content, “listening” to content, “rendering” content, “playing” content, “consuming” content, and the like are considered interchangeable, related, and/or the same. In some cases, the particular term utilized may be dependent on the context in which it is used. Accessing video may also be referred to as viewing or playing the video. Accessing audio may also be referred to as listening to or playing the audio.

This detailed description may refer to a given entity performing some action. It should be understood that this language may in some cases mean that a system (e.g., a computer) owned and/or controlled by the given entity is actually performing the action.

A wake word (or wake phrase) detected by a user device (e.g., voice assistant device, voice enabled device, smart device, computing device, etc.) may cause the user device to interact with a user (e.g., detect and/or process user commands and/or operational instructions, etc.). A wake word (or wake phrase) detected by a user device (e.g., voice assistant device, voice enabled device, smart device, computing device, etc.) may cause the user device to process content/information, confirm content/information, and/or the like.

The user device may require that a wake word be received/detected prior to any interactions. The user device may determine whether a wake word is received/detected (e.g., whether to accept a potential wake word, etc.) or not based on a threshold (e.g., a wake word detection threshold, etc.). The user device may detect audio content (e.g., speech, etc.) and detect/identify a potential wake word from the audio content. The audio content may include and/or be associated with the wake word (or wake phrase), and/or include one or more words that are similar to the wake word (or wake phrase). A wake word (or wake phrase) may be, “hey device,” and audio content may include one or more words that are similar to the wake word (or wake phrase), such as, “hey, devices are usually sold here.” The user device may assign a confidence score indicative of the accuracy of the detection of the wake word (e.g., did the user device detect an actual wake word/phrase, a different/similar word/phrase, background noise, etc.). The user device may compare the confidence score to the threshold (e.g., a wake word detection threshold, etc.) to determine whether to accept one or more words included with audio content as a wake word (or wake phrase) or not. The user device may modify the threshold based on whether the user device recognizes the audio content as originating from an authorized user. An authorized user may include a person registered to operate the user device (e.g., based on a user profile, etc.), a person that has permission to use the user device, a user associated with the user device, and/or the like. If audio content originates from an authorized user, the threshold may be low and/or lowered. If audio content does not originate from an authorized user, such as audio content that originates from an unauthorized user (e.g., a non-authorized user, a user not associated with an authorized user, etc.), a device (e.g., a television, a radio, a computing device, etc.), or the like, the threshold may be high and/or raised.

To determine whether audio content (e.g., speech, etc.) originates from an authorized user, the user device may generate, based on the audio content, a voiceprint (e.g., one or more measurable characteristics of a human voice that uniquely identifies an individual, etc.). To generate the voiceprint, the user device may determine audio characteristics of the audio content. Audio characteristics may be and/or include a frequency, a duration, a decibel level, an amplitude, a tone, an inflection, an audio rate, an audio volume, and/or any such characteristic associated with the audio content. The voiceprint may be a collection and/or combination of the audio characteristics. The user device may compare the voiceprint to a stored voiceprint known to originate from an authorized user. If the voiceprint matches and/or corresponds to a stored voiceprint, the user device may determine that the audio content associated with the voiceprint originates from an authorized user. The user device, based on determining that audio content originates from an authorized user, may lower the threshold (e.g., the wake word detection threshold, etc.) used to determine whether the wake word was detected. If the voiceprint does not match and/or correspond to a stored voiceprint, the user device may determine that the audio content associated with the voiceprint did not originate from an authorized user (e.g., audio content that originates from an unknown, non-authorized, and/or unauthorized user, etc.). The user device, based on determining that the audio content associated with the voiceprint did not originate from an authorized user, may raise the threshold used to determine whether the wake word was detected.
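As a non-limiting illustration, the voiceprint comparison described above may be sketched as follows. The `Voiceprint` fields, the function names, and the 10 percent matching tolerance are assumptions chosen for the example and are not specified by this description; a practical implementation may use different characteristics and a different matching rule.

```python
import math
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Voiceprint:
    """Hypothetical voiceprint: a small collection of audio characteristics."""
    frequency_hz: float
    duration_s: float
    level_db: float


def voiceprints_match(candidate, stored, rel_tol=0.10):
    """Return True when every characteristic is within rel_tol of the stored one.

    The 10 percent tolerance is an assumed value for illustration only.
    """
    return all(
        math.isclose(c, s, rel_tol=rel_tol)
        for c, s in zip(astuple(candidate), astuple(stored))
    )


def is_authorized(candidate, stored_voiceprints):
    """Treat audio as authorized if it matches any stored voiceprint."""
    return any(voiceprints_match(candidate, s) for s in stored_voiceprints)


# One stored voiceprint known to originate from an authorized user.
stored = [Voiceprint(175.0, 0.82, 60.0)]
print(is_authorized(Voiceprint(180.0, 0.80, 62.0), stored))  # close to the stored voiceprint
print(is_authorized(Voiceprint(110.0, 0.80, 62.0), stored))  # frequency differs too much
```

The result of `is_authorized` is what would then drive lowering or raising the wake word detection threshold.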

A low threshold (e.g., wake word detection threshold, etc.) may cause the user device to interact with a user (e.g., detect and/or process user commands and/or operational instructions, etc.) if the audio content includes either a wake word (or wake phrase), or one or more words that are similar to the wake word (or wake phrase). The user device may, based on determining that audio content originates from an authorized user, interact with a user if the audio content includes a wake word (or wake phrase), such as, “hey device.” When audio content is associated with an authorized user, the user device may interact with the authorized user if the audio content includes the wake word (or wake phrase) “hey device,” or includes one or more words that are similar to the wake word (or wake phrase) such as, “hey, devices are usually sold here,” “hey, Diana,” “having divided,” and/or any other one or more words that may be ambiguously associated with the wake word (or wake phrase).

A high threshold (e.g., wake word detection threshold, etc.) may cause the user device to interact with a user (e.g., detect and/or process user commands and/or operational instructions, etc.) only if the audio content includes a wake word (or wake phrase). The user device may, based on determining that audio content is not associated with an authorized user, such as audio content associated with an unauthorized user (e.g., a non-authorized user, a user not associated with an authorized user, etc.), a device (e.g., a television, a radio, a computing device, etc.), or the like, require that the audio content include a wake word (or wake phrase) prior to interacting with a user. The user device may, based on determining that audio content is not associated with an authorized user, require that audio content include a wake word (or wake phrase), such as “hey device,” prior to interacting with a user. The user device may not interact with a user, and/or remain in an unawakened state (e.g., standby, hibernate, etc.), based on determining that audio content is not associated with an authorized user. A high threshold may cause the user device to not interact with a user, and/or remain in an unawakened state (e.g., standby, hibernate, etc.), even if the audio content includes a wake word (or wake phrase).

The user device, by raising and lowering the threshold (e.g., the wake word detection threshold, etc.) based on whether audio content originates from an authorized user, may decrease or increase scrutiny applied to wake word detection.

FIG. 1 shows a system 100 for determining a wake word. A wake word (or wake phrase) may cause a user device (e.g., user device 101, etc.) to interact with a user (e.g., detect and/or process user commands and/or operational instructions, etc.). A wake word (or wake phrase) may cause a user device to process content/information, confirm content/information, and/or the like.

The system 100 may comprise a user device 101 (e.g., a voice assistant device, a voice enabled device, a smart device, a computing device, etc.). The user device 101 may be in communication with a network such as a network 105. The network 105 may be a network such as the Internet, a wide area network, a local area network, a cellular network, a satellite network, and the like. Various forms of communications may occur via the network 105. The network 105 may comprise wired and wireless telecommunication channels, and wired and wireless communication techniques.

The user device 101 may be associated with a device identifier 108. The device identifier 108 may be any identifier, token, character, string, or the like, for differentiating one user device (e.g., the user device 101, etc.) from another user device. The device identifier 108 may identify user device 101 as belonging to a particular class of user devices. The device identifier 108 may include information relating to the user device 101 such as a manufacturer, a model or type of device, a service provider associated with the user device 101, a state of the user device 101, a locator, and/or a label or classifier. Other information may be represented by the device identifier 108.

The device identifier 108 may have an address element 113 and a service element 112. The address element 113 may have or provide an internet protocol address, a network address, a media access control (MAC) address, an Internet address, or the like. The address element 113 may be relied upon to establish a communication session between the user device 101, a computing device 106, or other devices and/or networks. The address element 113 may be used as an identifier or locator of the user device 101. The address element 113 may be persistent for a particular network (e.g., network 105, etc.).

The service element 112 may identify a service provider associated with the user device 101 and/or with the class of the user device 101. The class of the user device 101 may be related to a type of device, capability of device, type of service being provided, and/or a level of service (e.g., business class, service tier, service package, etc.). The service element 112 may have information relating to or provided by a communication service provider (e.g., Internet service provider) that is providing or enabling data flow such as communication services to the user device 101. The service element 112 may have information relating to a preferred service provider for one or more particular services relating to the user device 101. The address element 113 may be used to identify or retrieve data from the service element 112, or vice versa. One or more of the address element 113 and the service element 112 may be stored remotely from the user device 101 and retrieved by one or more devices such as the user device 101, the computing device 106, or any other device. Other information may be represented by the service element 112.
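As a non-limiting illustration, the device identifier 108 with its address element 113 and service element 112 may be modeled as a simple nested record. The class names, field names, and example values below are hypothetical and are used only to show one possible arrangement of the information described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AddressElement:
    """Hypothetical address element: a locator for the user device."""
    ip_address: Optional[str] = None
    mac_address: Optional[str] = None


@dataclass(frozen=True)
class ServiceElement:
    """Hypothetical service element: provider and level of service."""
    service_provider: str
    service_level: Optional[str] = None


@dataclass(frozen=True)
class DeviceIdentifier:
    """Hypothetical device identifier combining both elements."""
    device_class: str
    address: AddressElement
    service: ServiceElement


# Example values are placeholders (documentation-reserved address ranges).
identifier = DeviceIdentifier(
    device_class="voice assistant device",
    address=AddressElement(ip_address="192.0.2.10", mac_address="00:00:5E:00:53:01"),
    service=ServiceElement(service_provider="Example ISP", service_level="service tier 1"),
)
print(identifier.address.ip_address)
```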

The user device 101 may have an audio content detection module 102 for detecting audio content. The audio content detection module 102 may detect and/or receive audio content originating from a user speaking in proximity to the user device 101 and/or the like. The audio content detection module 102 may include one or more microphones, or the like, that detect/receive the audio content. The audio content may have and/or be associated with a wake word (or wake phrase). The audio content may be content (e.g., speech, etc.) that originates from and/or is caused by a user (e.g., an authorized user, an unauthorized user, etc.) or a device (e.g., a television, a radio, a computing device, etc.). The audio content may include and/or be associated with the wake word (or wake phrase), and/or include one or more words that are similar to the wake word (or wake phrase). A wake word (or wake phrase) may be, “hey device.” The audio content may include the wake word or may include one or more words that are similar to the wake word (or wake phrase), such as, “hey, devices are usually sold here.” The audio content detection module 102 may provide the audio content (e.g., a signal indicative of the audio content) to an audio analysis module 103. The user device 101 may use the audio analysis module 103 to determine the wake word (or wake phrase) and/or one or more words that are similar to the wake word (or wake phrase).

The audio analysis module 103 may determine, based on the audio content, a wake word (or wake phrase) and/or one or more words that are similar to the wake word (or wake phrase). The audio analysis module 103 may determine the wake word (or wake phrase) and/or one or more words that are similar to the wake word (or wake phrase) by performing speech-to-text operations that translate audio content (e.g., speech, etc.) to text, other characters, or commands. The audio analysis module 103 may apply one or more voice recognition algorithms to the audio content (e.g., speech, etc.) to extract a word and/or words. The audio analysis module 103 may convert the extracted word or words to text and compare the text to a stored word and/or stored words (e.g., stored in the storage module 104, etc.), such as a wake word (or wake phrase). A wake word (or wake phrase) and/or one or more words that are similar to the wake word (or wake phrase), such as wake word synonyms that share a phonetic relationship, and the like, may be stored (e.g., stored in the storage module 104, etc.), such as during a device (user device) registration process, when a user profile associated with a user device is generated, and/or any other suitable/related method.

The audio analysis module 103 may determine whether a word and/or words extracted/determined from the audio content match a stored wake word (or wake phrase), or are related to a stored wake word (or wake phrase). The audio analysis module 103 may determine whether audio content includes a wake word (or wake phrase) (e.g., did the user device 101 detect an actual wake word (or wake phrase), such as “hey device,” etc.), or one or more words related to the wake word (or wake phrase) (e.g., did the user device 101 detect a different/similar word/phrase related to the wake word (or wake phrase), such as, “hey, devices are usually sold here,” “hey, Diana,” “having divided,” etc.).
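As a non-limiting illustration, comparing extracted text to a stored wake phrase and to stored related phrases may be sketched as follows. The speech-to-text step is assumed to have already produced a transcript; the function names, the normalization rule, and the use of an exact lookup against a stored set of related phrases are assumptions made for the example.

```python
import re


def normalize(text):
    """Lowercase the text, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return " ".join(text.split())


# Hypothetical contents of the storage module for this example.
STORED_WAKE_PHRASE = "hey device"
STORED_RELATED_PHRASES = {
    "hey devices are usually sold here",
    "hey diana",
    "having divided",
}


def classify(transcript):
    """Return 'match', 'related', or 'none' for a transcript."""
    words = normalize(transcript)
    if words == STORED_WAKE_PHRASE:
        return "match"
    if words in STORED_RELATED_PHRASES:
        return "related"
    return "none"


print(classify("Hey, Device!"))   # match
print(classify("hey, Diana"))     # related
print(classify("good morning"))   # none
```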

The audio analysis module 103 may assign a confidence score indicative of the accuracy of whether a wake word (or wake phrase) is determined from audio content. A confidence score may be based on a scale, such as from a value of one (1) to ten (10), where scale values correspond to an accuracy of wake word detection. A confidence score may be based on any scale and/or value. The audio analysis module 103 may determine that the audio content includes one or more words, such as “hey device,” that match a stored wake word (or wake phrase) “hey device,” associated with the user device 101. The audio analysis module 103 may assign a confidence score of ten (10) which indicates that the wake word (or wake phrase) determined from the audio content matches (e.g., approximately 100 percent accuracy, etc.) a stored wake word (or wake phrase).

The audio analysis module 103 may determine that audio content includes one or more words that are similar to a stored wake word (or wake phrase) “hey device,” such as, “hey, devices are usually sold here,” “hey, Diana,” “having divided,” and/or any other one or more words that may be ambiguously associated with the stored wake word (or wake phrase). The audio analysis module 103 may assign a confidence score of eight (8) to one or more words that are similar to the stored wake word (or wake phrase) “hey device,” such as, “hey, devices are usually sold here,” which indicates that the one or more words determined from the audio content are a close match (e.g., similar, a partial match, less than percent accuracy, etc.) to the stored wake word (or wake phrase). The audio analysis module 103 may assign a confidence score of two (2) to one or more words that are similar to the stored wake word (or wake phrase) “hey device,” such as, “hey, do you want tacos tonight,” which indicates that the one or more words determined from the audio content are weakly related (e.g., somewhat similar, a partial match, less than percent accuracy, etc.) to the stored wake word. The audio analysis module 103 may assign any confidence score indicative of the accuracy of a possible wake word (or wake phrase) determined from audio content.
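As a non-limiting illustration, a confidence score on a scale of one (1) to ten (10) may be derived from a text similarity measure. The sketch below uses the `SequenceMatcher` class from the Python standard library as a stand-in similarity measure; this choice, the function names, and the rounding rule are assumptions for the example. The scores it produces for similar phrases will generally differ from the example values of eight (8) and two (2) given above, although an exact match scores ten (10).

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase the text, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return " ".join(text.split())


def confidence_score(transcript, wake_phrase="hey device"):
    """Map a 0.0-1.0 similarity ratio onto an integer scale of 1 to 10."""
    ratio = SequenceMatcher(None, normalize(wake_phrase), normalize(transcript)).ratio()
    return max(1, min(10, round(ratio * 10)))


exact = confidence_score("hey device")
similar = confidence_score("hey, devices are usually sold here")
weak = confidence_score("hey, do you want tacos tonight")
print(exact)           # 10, identical text yields a ratio of 1.0
print(similar > weak)  # the closer phrase receives the higher score
```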

The audio analysis module 103 may compare a confidence score to a threshold (e.g., a wake word detection threshold, etc.) to determine whether to accept one or more words included with audio content as a wake word (or wake phrase) or not. The audio analysis module 103 may determine to accept the one or more words included with the audio content as the wake word (or wake phrase) when the confidence score is equal to and/or satisfies the threshold, and may determine not to accept the one or more words as the wake word (or wake phrase) when the confidence score does not satisfy the threshold. The threshold may be a value, such as a threshold value of six (6). Audio content that includes one or more words that are similar to a stored wake word (or wake phrase) “hey device,” such as “hey, devices are usually sold here,” may be assigned a confidence score of eight (8). The audio content including the one or more words that are similar to the stored wake word (or wake phrase) may satisfy the threshold because the assigned confidence score of eight (8) is greater than the threshold value of six (6). The threshold may be any value. The threshold may be satisfied by a confidence score that is equal to, or greater than, the threshold value. A confidence score that is less than the threshold value may not satisfy the threshold. The audio analysis module 103 may modify the threshold based on whether the user device 101 (e.g., the audio analysis module 103, etc.) determines that the audio content originates from an authorized user.
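As a non-limiting illustration, the comparison of a confidence score to the threshold may be expressed as a single check, using the example values above. The function name is hypothetical.

```python
def accept_wake_word(confidence_score, threshold):
    """A score equal to, or greater than, the threshold satisfies it."""
    return confidence_score >= threshold


print(accept_wake_word(8, 6))  # True: eight (8) satisfies a threshold of six (6)
print(accept_wake_word(6, 6))  # True: an equal score also satisfies the threshold
print(accept_wake_word(5, 6))  # False: a lower score does not
```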

When audio content originates from an authorized user, the threshold (e.g., a wake word detection threshold, etc.) may be low and/or lowered. When audio content does not originate from an authorized user, such as audio content that originates from an unauthorized user (e.g., a non-authorized user, a user not associated with an authorized user, etc.), a device (e.g., a television, a radio, a computing device, etc.), or the like, the threshold may be high and/or raised.

The user device 101 may determine whether audio content is associated with an authorized user or an unauthorized user. An authorized user may include a person registered to operate the user device 101 (e.g., based on a user profile, etc.), a person that has permission to use the user device 101, a user associated with the user device 101, and/or the like. An unauthorized user may include a person that is not registered to operate the user device 101 (e.g., based on a user profile, etc.), a person that does not have permission to use the user device 101, a user that is not associated with the user device 101, a person that is not associated with an authorized user, and/or the like, such as a guest in a home where the user device 101 may be located, a device (e.g., a television, a radio, a computing device, etc.) generating audio content, and/or the like.

To determine whether audio content (e.g., speech, etc.) originates from an authorized user, the audio analysis module 103 may generate a voiceprint (e.g., one or more measurable characteristics of a human voice that uniquely identifies an individual, etc.) from the audio content. The audio analysis module 103 may determine audio characteristics of the audio content. Audio characteristics may be and/or include a frequency, a duration, a decibel level, an amplitude, a tone, an inflection, an audio rate, an audio volume, and/or any such characteristic associated with the audio content. The voiceprint may be a collection and/or combination of the audio characteristics. The audio analysis module 103 may determine and store (e.g., via storage module 104) audio characteristics. The audio analysis module 103 may determine and store audio characteristics when the user device 101 is configured for a “learn” or “discovery” mode, during an initial setup and/or registration of the user device 101, based on repeated use of the user device 101, combinations thereof, and the like. The user device 101 may associate a voiceprint with a particular user and/or store/associate the voiceprint with a profile (e.g., user profile).

The audio analysis module 103 may compare a voiceprint determined from audio content to a stored voiceprint known to originate from an authorized user. When a voiceprint matches and/or corresponds to a stored voiceprint, the audio analysis module 103 may determine that audio content associated with the voiceprint originates from an authorized user. The user device 101 may determine that a voiceprint does not correspond (match) to a stored voiceprint. A voiceprint that does not correspond (match) to a stored voiceprint may be associated with an unauthorized user.

The audio analysis module 103, based on determining that the audio content originates from an authorized user, may lower the threshold (e.g., the wake word detection threshold, etc.) used to determine whether the wake word (or wake phrase) (e.g., stored wake word/phrase, etc.) was detected. The audio analysis module 103, based on determining that the audio content originates from an unauthorized user, may raise the threshold used to determine whether the wake word (or wake phrase) (e.g., stored wake word/phrase, etc.) was detected.

A low threshold (e.g., wake word detection threshold, etc.) may cause the user device 101 to interact with a user (e.g., detect and/or process user commands and/or operational instructions, etc.) if the audio content includes either a wake word (or wake phrase) (e.g., matches a stored wake word/phrase, etc.), or one or more words that are similar to the wake word (or wake phrase) (e.g., similar to a stored wake word/phrase, etc.). The user device 101 may, based on the audio analysis module 103 determining that audio content is associated with an authorized user, interact with a user (e.g., detect and/or process user commands and/or operational instructions, etc.) if the audio content includes a wake word (or wake phrase), such as, “hey device.” The user device 101 may, based on the audio analysis module 103 determining that audio content is associated with an authorized user, interact with the authorized user (e.g., detect and/or process user commands and/or operational instructions, etc.) if the audio content includes one or more words that are similar to the wake word (or wake phrase) “hey device,” such as, “hey, devices are usually sold here,” “hey, Diana,” “having divided,” and/or any other one or more words that may be ambiguously associated with the wake word (or wake phrase).

A high threshold (e.g., wake word detection threshold, etc.) may cause the user device 101 to interact with a user (e.g., detect and/or process user commands and/or operational instructions, etc.) only if audio content includes a wake word (or wake phrase). The user device 101 may, based on the audio analysis module 103 determining that audio content is not associated with an authorized user, such as audio content associated with an unauthorized user (e.g., a non-authorized user, a user not associated with an authorized user, etc.), a device (e.g., a television, a radio, a computing device, etc.), or the like, require that the audio content include a wake word (or wake phrase) prior to interacting with a user. The user device 101 may, based on the audio analysis module 103 determining that the audio content is not associated with an authorized user, require that the audio content include a wake word (or wake phrase), such as “hey device,” prior to interacting with a user. The user device 101 may not interact with a user, and/or may remain in an unawakened state (e.g., standby, hibernate, etc.), based on the audio analysis module 103 determining that audio content is not associated with an authorized user. A high threshold may cause the user device 101 to not interact with a user, and/or remain in an unawakened state (e.g., standby, hibernate, etc.), even if audio content includes a wake word (or wake phrase). The user device 101, by raising and lowering the threshold (e.g., the wake word detection threshold, etc.) based on whether audio content originates from an authorized user, may decrease or increase scrutiny applied to wake word detection.

The user device 101 may have a communication module 105 for providing an interface to a user to interact with the user device 101 and/or the computing device 106. The communication module 105 may be any interface for presenting and/or receiving information to/from the user, such as user feedback. An interface may be a communication interface such as a web browser (e.g., Internet Explorer®, Mozilla Firefox, Google Chrome®, Safari®, or the like). Other software, hardware, and/or interfaces may be used to provide communication between the user and one or more of the user device 101 and the computing device 106. The communication module 105 may request or query various files from a local source and/or a remote source. The communication module 105 may transmit data, such as audio content, voice characteristics, voiceprint information, and the like to a local or remote device such as the computing device 106.

The computing device 106 may be a server for communicating with the user device 101. The computing device 106 may communicate with the user device 101 for providing data and/or services. The computing device 106 may provide services, such as wake word determination services, network (e.g., Internet) connectivity, network printing, media management (e.g., media server), content services, streaming services, broadband services, or other network-related services. The computing device 106 may allow the user device 101 to interact with remote resources such as data, devices, and files. The computing device 106 may be configured as (or disposed at) a central location (e.g., a headend, or processing facility), which may receive content (e.g., audio content, wake word determination content, voiceprint information, user profiles, data, input programming, etc.) from multiple sources. The computing device 106 may combine the content from the multiple sources and may distribute the content to user (e.g., subscriber) locations via a distribution system (e.g., the network 105, etc.).

The computing device 106 may manage the communication between the user device 101 and a database 114 for sending and receiving data therebetween. The database 114 may store a plurality of files (e.g., audio content, wake word determination content, voiceprint information, user profiles, etc.), user identifiers or records, or other information. The user device 101 may request and/or retrieve a file from the database 114. The database 114 may store information relating to the user device 101 such as user information (e.g., authorized user information, unauthorized user information, etc.), wake word information, the address element 110, and/or the service element 112. The computing device 106 may obtain the device identifier 108 from the user device 101 and retrieve information from the database 114 such as user information (e.g., authorized user information, unauthorized user information, etc.), wake word information, the address element 110, and/or the service element 112. The computing device 106 may obtain the address element 110 from the user device 101 and may retrieve the service element 112 from the database 114, or vice versa. Any information may be stored in and retrieved from the database 114. The database 114 may be disposed remotely from the computing device 106 and accessed via direct or indirect connection. The database 114 may be integrated with the computing device 106 or some other device or system.

The user device 101 may communicate with the computing device 106 to determine if audio content includes a wake word (or wake phrase) and/or whether the audio content is associated with an authorized user or an unauthorized user (e.g., a non-authorized user, a user not associated with an authorized user, etc.). The user device 101 may communicate with the computing device 106 while not interacting with a user (e.g., while in standby, while hibernating, etc.). The audio content detection module 102 may detect and/or receive audio content based on a user speaking in proximity to the user device 101 and/or the like. The user device 101 may use the communication module 105 to communicate with the computing device 106 to determine if audio content includes a wake word (or wake phrase) and/or whether the audio content is associated with an authorized user or an unauthorized user. The user device 101 may provide the audio content to the computing device 106. The communication module 105 may comprise a transceiver configured for communicating information using any suitable wireless protocol, such as Wi-Fi (IEEE 802.11), BLUETOOTH®, cellular, satellite, infrared, or any other suitable wireless standard. The communication module 105 may communicate with the computing device 106 via a short-range communication technique (e.g., BLUETOOTH®, near-field communication, infrared, and the like). The communication module 105 may communicate with the computing device 106 via a long-range communication technique (e.g., Internet, cellular, satellite, and the like).

The computing device 106 may include an audio analysis module 123. The computing device 106 may use the audio analysis module 123 to determine a wake word (or wake phrase) from the audio content received from the user device 101. The audio analysis module 123 may determine one or more words that are similar to the wake word (or wake phrase). The audio analysis module 123 may determine whether a word and/or words extracted/determined from audio content match a stored wake word (or wake phrase), or are related to a stored wake word (or wake phrase). The audio analysis module 123 may assign a confidence score indicative of the accuracy of a possible wake word (or wake phrase) determined from audio content. A confidence score may be based on a scale, such as from a value of one (1) to ten (10), where scale values correspond to an accuracy of wake word detection. A confidence score may be based on any scale and/or value. The audio analysis module 123 may determine that audio content includes one or more words, such as “hey device,” that match a stored wake word (or wake phrase) “hey device,” associated with the user device 101. The audio analysis module 123 may assign a confidence score of ten (10) which indicates that the wake word (or wake phrase) determined from the audio content matches (e.g., approximately 100 percent accuracy, etc.) a stored wake word. The audio analysis module 123 may determine that audio content includes one or more words that are similar to a wake word (or wake phrase) “hey device,” such as, “hey, devices are usually sold here,” “hey, Diana,” “having divided,” and/or any other one or more words that may be ambiguously associated with a stored wake word (or wake phrase), such as “hey device,” associated with the user device 101. The audio analysis module 123 may assign a confidence score of eight (8) to one or more words that are similar to the stored wake word (or wake phrase) “hey device,” such as, “hey, devices are usually sold here,” which indicates that the one or more words determined from the audio content are close to (e.g., similar, a partial match, less than a percent of accuracy, etc.) the stored wake word. The audio analysis module 123 may assign a confidence score of two (2) to one or more words that are similar to the stored wake word (or wake phrase) “hey device,” such as, “hey, do you want tacos tonight,” which indicates that the one or more words determined from the audio content are weakly related (e.g., somewhat similar, a partial match, less than percent accuracy, etc.) to the stored wake word. The audio analysis module 123 may assign any confidence score indicative of the accuracy of a possible wake word (or wake phrase) determined from audio content.
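As a non-limiting sketch, a confidence score on a scale of one (1) to ten (10) may be derived from a text similarity measure. The example below assumes that the audio content has already been converted to text and uses Python's standard difflib.SequenceMatcher as a stand-in similarity measure. The function name confidence_score and the linear mapping onto the scale are hypothetical. This character-level measure is not the scoring technique of this description and is not expected to reproduce the example scores of eight (8) and two (2) given above; it only shows that an exact match maps to the top of the scale and weaker matches map lower.

```python
from difflib import SequenceMatcher


def confidence_score(candidate: str, wake_word: str = "hey device") -> int:
    """Map text similarity to an integer confidence score from 1 to 10.

    The similarity measure and the linear mapping are illustrative assumptions.
    """
    ratio = SequenceMatcher(None, candidate.lower(), wake_word.lower()).ratio()
    # ratio is in [0.0, 1.0]; scale it onto 1..10 and clamp to the scale.
    return max(1, min(10, round(1 + 9 * ratio)))
```

An exact match such as “hey device” receives the maximum score of ten (10) under this mapping, while longer phrases that merely contain similar characters receive a lower score.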

The audio analysis module 123 may compare a confidence score to a threshold (e.g., a wake word detection threshold, etc.) to determine whether to accept one or more words included with audio content as a wake word or not. The audio analysis module 123 may determine to accept the one or more words included with the audio content as the wake word (or wake phrase) when the confidence score is equal to and/or satisfies the threshold, and may determine not to accept the one or more words as the wake word (or wake phrase) when the confidence score does not satisfy the threshold. The threshold may be satisfied by a confidence score that is equal to, or greater than, the threshold value. A confidence score that is less than the threshold value may not satisfy the threshold. The audio analysis module 103 may modify the threshold based on whether the audio content originates from an authorized user.

The audio analysis module 123 may modify the threshold based on whether the audio analysis module 123 determines that the audio content originates from an authorized user. When audio content originates from an authorized user, the threshold may be low and/or lowered. When audio content does not originate from an authorized user, such as audio content that originates from an unauthorized user (e.g., a non-authorized user, a user not associated with an authorized user, etc.), a device (e.g., a television, a radio, a computing device, etc.), or the like, the threshold may be high and/or raised.

The computing device 106 may determine whether audio content is associated with an authorized user or an unauthorized user. An authorized user may include a person registered to operate the user device 101 (e.g., based on a user profile, etc.), a person that has permission to use the user device 101, a user associated with the user device 101, and/or the like. An unauthorized user may include a person that is not registered to operate the user device 101 (e.g., based on a user profile, etc.), a person that does not have permission to use the user device 101, a user that is not associated with the user device 101, a person not associated with an authorized user, and/or the like, such as a guest in a home where the user device 101 may be located, a device (e.g., a television, a radio, a computing device, etc.) generating audio content, and/or the like.

To determine whether audio content (e.g., speech, etc.) originates from an authorized user, the audio analysis module 123 may generate, based on the audio content (e.g., speech, etc.), a voiceprint (e.g., one or more measurable characteristics of a human voice that uniquely identify an individual, etc.). The audio analysis module 123 may determine audio characteristics of the audio content. Audio characteristics may be and/or include a frequency, a duration, a decibel level, an amplitude, a tone, an inflection, an audio rate, an audio volume, and/or any such characteristic associated with the audio content. The voiceprint may be a collection and/or combination of the audio characteristics. The audio analysis module 123 may determine and store (e.g., via the database 114, etc.) audio characteristics. The audio analysis module 123 may receive audio content from the user device 101 and determine and store audio characteristics when the user device 101 is configured for a “learn” or “discovery” mode, during an initial setup and/or registration of the user device 101, based on repeated use of the user device 101, combinations thereof, and the like. The computing device 106 may associate a voiceprint with a particular user and/or store/associate the voiceprint with a profile (e.g., user profile).

The audio analysis module 123 may compare a voiceprint determined from audio content to a stored voiceprint known to originate from an authorized user. When a voiceprint matches and/or corresponds to a stored voiceprint, the audio analysis module 123 may determine that audio content associated with the voiceprint originates from an authorized user. The computing device 106 may determine that a voiceprint does not correspond (match) to a stored voiceprint. A voiceprint that does not correspond (match) to a stored voiceprint may be associated with an unauthorized user.

The computing device 106, based on determining that audio content is associated with an authorized user or an unauthorized user, may lower or raise a threshold (e.g., the wake word detection threshold, etc.) used to determine whether the wake word (or wake phrase) (e.g., stored wake word/phrase, etc.) was detected. A low threshold may cause the computing device 106 to instruct the user device 101 to interact with a user (e.g., detect and/or process user commands and/or operational instructions, etc.) if the audio content includes either a wake word (or wake phrase) (e.g., matches a stored wake word/phrase, etc.), or one or more words that are similar to the wake word (or wake phrase) (e.g., similar to a stored wake word/phrase, etc.). A low threshold may cause the computing device 106 to instruct the user device 101 to interact with a user when audio content is associated with a confidence score that is equal to, or satisfies, a threshold (e.g., wake word detection threshold, etc.) value that is low on a threshold value scale, such as a value of two (2) on a scale from one (1) to ten (10).

A high threshold (e.g., wake word detection threshold, etc.) may cause the computing device 106 to instruct the user device 101 to interact with a user (e.g., detect and/or process user commands and/or operational instructions, etc.) only if audio content includes a wake word (or wake phrase). A high threshold may cause the computing device 106 to instruct the user device 101 to interact with a user when audio content is associated with a confidence score that is equal to, or satisfies, a threshold (e.g., wake word detection threshold, etc.) value that is high on a threshold value scale, such as a value of nine (9) on a scale from one (1) to ten (10).
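For illustration, the low and high threshold behavior may be sketched as follows, using the example threshold values of two (2) and nine (9) on a scale from one (1) to ten (10). The constant and function names are hypothetical, and the threshold may be satisfied by a confidence score equal to or greater than the threshold value.

```python
LOW_THRESHOLD = 2   # example low value on the 1-10 scale (authorized user)
HIGH_THRESHOLD = 9  # example high value on the 1-10 scale (unauthorized user)


def select_threshold(is_authorized: bool) -> int:
    """Lower the threshold for an authorized user; raise it otherwise."""
    return LOW_THRESHOLD if is_authorized else HIGH_THRESHOLD


def should_instruct_wake(confidence_score: int, is_authorized: bool) -> bool:
    """True if the user device should be instructed to interact with the user."""
    return confidence_score >= select_threshold(is_authorized)
```

Under these example values, a confidence score of eight (8) satisfies the lowered threshold for an authorized user but does not satisfy the raised threshold for an unauthorized user.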

The computing device 106 may instruct the user device 101, based on determining that audio content originates from an unauthorized/unknown user, to require that audio content include a wake word (or wake phrase), such as a stored wake word (or wake phrase), prior to interacting with the unauthorized user. The user device 101, based on instructions from the computing device 106, may not interact with a user, and/or may remain in an unawakened state (e.g., standby, hibernate, etc.) when audio content is not associated with an authorized user. A high threshold may cause the computing device 106 to instruct the user device 101 to not interact with a user, and/or remain in an unawakened state (e.g., standby, hibernate, etc.), even if audio content includes a wake word (or wake phrase). The computing device 106, by raising and lowering the threshold (e.g., the wake word detection threshold, etc.) based on whether audio content originates from an authorized user, may decrease or increase scrutiny applied to wake word detection.

The computing device 106 may provide an indication to the user device 101 that audio content is associated with an authorized user and/or an unauthorized user. The computing device 106 may provide an indication to the user device 101 that the audio content includes (or does not include) a wake word (or wake phrase) and/or one or more words that are similar to the wake word (or wake phrase). An indication provided to the user device 101 that audio content originates from an authorized user or an unauthorized user may also include a confidence score associated with one or more words determined from the audio content that indicates an accuracy of the detection of a wake word associated with the audio content.
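One possible, purely illustrative representation of such an indication is a small record serialized as JSON before being provided to the user device. The class name WakeIndication, its field names, and the use of JSON are assumptions made for this sketch; this description does not specify a message format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class WakeIndication:
    """Hypothetical indication sent from the computing device to the user device."""
    authorized_user: bool
    wake_word_detected: bool
    similar_words_detected: bool
    confidence_score: Optional[int] = None  # optional, e.g., 1-10

    def to_json(self) -> str:
        """Serialize the indication for transmission."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload: str) -> "WakeIndication":
        """Rebuild the indication on the receiving side."""
        return cls(**json.loads(payload))
```

The receiving device may rebuild the record with from_json and then lower or raise its own threshold based on the authorized_user field.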

The computing device 106 may provide an indication to the user device 101 that audio content is associated with an authorized user and/or an unauthorized user, and/or whether the audio content includes (or does not include) a wake word (or wake phrase) and/or one or more words that are similar to the wake word (or wake phrase), via the network 105. The computing device 106 may provide an indication to the user device 101 that audio content is associated with an authorized user and/or an unauthorized user, and/or whether the audio content includes (or does not include) a wake word (or wake phrase) and/or one or more words that are similar to the wake word (or wake phrase) via a short-range communication technique (e.g., BLUETOOTH®, near-field communication, infrared, and the like). The computing device 106 may provide an indication to the user device 101 that audio content is associated with an authorized user and/or an unauthorized user, and/or whether the audio content includes (or does not include) a wake word (or wake phrase) and/or one or more words that are similar to the wake word (or wake phrase), via a long-range communication technique (e.g., Internet, cellular, satellite, and the like).

The user device 101, based on an indication received from the computing device 106 that audio content is associated with an authorized user or an unauthorized user, may lower or raise the threshold (e.g., the wake word detection threshold, etc.) used to determine whether the wake word (or wake phrase) (e.g., stored wake word/phrase, etc.) was detected. A low threshold may cause the user device 101 to interact with a user (e.g., detect and/or process user commands and/or operational instructions, etc.) if the audio content includes either a wake word (or wake phrase) (e.g., matches a stored wake word/phrase, etc.), or one or more words that are similar to the wake word (or wake phrase) (e.g., similar to a stored wake word/phrase, etc.). A low threshold may cause the user device 101 to interact with a user when audio content is associated with a confidence score that is equal to, or satisfies, a threshold (e.g., wake word detection threshold, etc.) value that is low on a threshold value scale, such as a value of two (2) on a scale from one (1) to ten (10).

A high threshold (e.g., wake word detection threshold, etc.) may cause the user device 101 to interact with a user (e.g., detect and/or process user commands and/or operational instructions, etc.) only if audio content includes a wake word (or wake phrase). A high threshold may cause the user device 101 to interact with a user when audio content is associated with a confidence score that is equal to, or satisfies, a threshold (e.g., wake word detection threshold, etc.) value that is high on a threshold value scale, such as a value of nine (9) on a scale from one (1) to ten (10).

The user device 101 may, based on an indication from the computing device 106 that audio content originates from an unauthorized/unknown user, require that audio content include a wake word (or wake phrase), such as a stored wake word (or wake phrase), prior to interacting with the unauthorized user. The user device 101 may not interact with a user, and/or may remain in an unawakened state (e.g., standby, hibernate, etc.), based on the audio analysis module 103 determining that audio content is not associated with an authorized user. A high threshold may cause the user device 101 to not interact with a user, and/or remain in an unawakened state (e.g., standby, hibernate, etc.), even if audio content includes a wake word (or wake phrase). The user device 101, by raising and lowering the threshold (e.g., the wake word detection threshold, etc.) based on whether audio content originates from an authorized user, may decrease or increase scrutiny applied to wake word detection.

FIG. 2 shows a process 200 for determining a wake word. At 201, a user device (e.g., voice assistant device, voice enabled device, smart device, computing device, the user device 101, etc.) may detect and/or receive audio content (e.g., speech, etc.). The user device may detect and/or receive the audio content based on a user (or users) speaking in proximity to the user device, a device (e.g., a television, a radio, a computing device, etc.) in proximity to the user device, and/or any other audio content source in proximity to the user device. The user device may include one or more microphones, or the like, that detect/receive the audio content. The audio content may/may not include a wake word (or wake phrase) and/or may include one or more words that are similar to the wake word (or wake phrase). The audio content may include a wake word (or wake phrase) such as, “hey device,” and/or include one or more words that are similar to the wake word (or wake phrase), such as, “Hey, devices are usually sold here.”

The user device may determine whether the audio content (e.g., speech, etc.) includes the wake word (or wake phrase) and/or includes one or more words that are similar to the wake word (or wake phrase) by performing speech-to-text operations and/or applying one or more voice recognition algorithms to the audio content to extract text, such as a word and/or words. The user device may compare the text (e.g., the extracted word and/or words, etc.) to a stored text (e.g., a stored word and/or stored words, etc.), such as a wake word (or wake phrase). The user device may access a storage that includes the wake word (or wake phrase) and/or one or more words that are similar to the wake word (or wake phrase), synonyms of the wake word (or wake phrase), words that share a phonetic relationship with the wake word (or wake phrase), and the like to determine whether the wake word is detected from the audio content.

At 202, the user device may assign a confidence score indicative of the accuracy of the detection of the wake word (e.g., did the user device detect the actual wake word/phrase, a different/similar word/phrase, background noise, etc.). A confidence score may be based on a scale, such as from a value of one (1) to ten (10), where scale values correspond to an accuracy of wake word detection. A confidence score may be based on any scale and/or value. The user device may determine that the audio content includes one or more words, such as “hey device,” that match a stored wake word (or wake phrase) “hey device.” The user device may assign a confidence score of ten (10) to the one or more words determined from the audio content. The confidence score of ten (10) may indicate that the one or more words match (e.g., substantially correspond, approximately 100 percent accuracy, etc.) the stored wake word. One or more words that are similar to the stored wake word (or wake phrase) such as, “hey, devices are usually sold here,” may be assigned a confidence score of eight (8). The confidence score of eight (8) may indicate that the one or more words determined from the audio content are close (e.g., similar, a partial match, less than percent accuracy, etc.) to the stored wake word (or wake phrase). One or more words that are similar to the stored wake word (or wake phrase) such as, “hey, do you want tacos tonight,” may be assigned a confidence score of two (2). The confidence score of two (2) may indicate that the one or more words are weakly related (e.g., somewhat similar, a partial match, less than percent accuracy, etc.) to the stored wake word. The user device may assign any confidence score indicative of the accuracy of detection of the wake word (e.g., determining one or more words that match/correspond to a stored wake word, etc.).

At 203, the user device may determine whether the audio content (e.g., speech, etc.) originates from an authorized user or an unauthorized user. Block 202 and block 203 may be performed in any order, including in parallel. An authorized user may include a person registered to operate the user device (e.g., based on stored user information, a user profile, etc.), a person that has permission to use the user device, a user associated with the user device, and/or the like. An unauthorized user may include an unknown user, a device (e.g., a television, a radio, a computing device, etc.), a person not associated with an authorized user, and/or the like. The user device may determine whether the audio content originates from an authorized user or an unauthorized user by generating, based on the audio content, a voiceprint (e.g., one or more measurable characteristics of a human voice that uniquely identify an individual, etc.) and determining whether the voiceprint matches a stored voiceprint.

To generate the voiceprint, the user device may determine audio characteristics of the audio content. Audio characteristics may be and/or include a frequency, a duration, a decibel level, an amplitude, a tone, an inflection, an audio rate, an audio volume, and/or any such characteristic associated with the audio content. The voiceprint may be a collection and/or combination of the audio characteristics. The user device may determine and store audio characteristics as a voiceprint. The user device may determine and store voiceprints when the user device is configured for a “learn” or “discovery” mode, during an initial setup and/or registration of the user device, based on repeated use of the user device, combinations thereof, and the like.
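As a hedged illustration of building a voiceprint during a “learn” mode or through repeated use, the sketch below averages a fixed set of audio characteristics over several samples. The class name VoiceprintEnroller, the chosen characteristic names, and the use of a simple mean are assumptions for this example only; the voiceprint may be any collection and/or combination of audio characteristics.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceprintEnroller:
    """Accumulates audio characteristics and averages them into a voiceprint."""
    # Hypothetical subset of the audio characteristics named in the description.
    feature_names: tuple = ("frequency_hz", "duration_s", "decibel_level")
    _sums: list = field(default_factory=list)
    _count: int = 0

    def add_sample(self, characteristics: dict) -> None:
        """Add one sample's characteristics (e.g., from one utterance)."""
        values = [float(characteristics[name]) for name in self.feature_names]
        if not self._sums:
            self._sums = [0.0] * len(values)
        self._sums = [s + v for s, v in zip(self._sums, values)]
        self._count += 1

    def voiceprint(self) -> list:
        """Return the mean of each characteristic across all samples."""
        if self._count == 0:
            raise ValueError("no samples enrolled")
        return [s / self._count for s in self._sums]
```

The resulting list may then be stored in association with a user profile and compared against voiceprints determined from later audio content.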

Stored voiceprints may be associated with user profiles, such as profiles of authorized users. The user device may compare the voiceprint determined from the audio content to a stored voiceprint associated with an authorized user. When a voiceprint matches and/or corresponds to a stored voiceprint, the user device may determine that audio content associated with the voiceprint originates from an authorized user. When a voiceprint does not match and/or correspond to a stored voiceprint, the user device may determine that the voiceprint may be associated with an unauthorized user.

The user device, based on whether the user device recognizes the audio content as originating from an authorized user or an unauthorized user, may modify a threshold (e.g., the wake word detection threshold, etc.). At 205, the threshold may be low and/or lowered if the audio content originates from an authorized user. At 206, the threshold may be high and/or raised if the audio content originates from an unauthorized user, such as an unknown user, a device (e.g., a television, a radio, a computing device, etc.), and/or the like.

At 207, the user device may compare and/or apply the confidence score associated with the audio content (e.g., speech, etc.), that is determined at 202, to the threshold (e.g., the wake word detection threshold, etc.). The user device may compare and/or apply the confidence score to the threshold to determine whether to accept one or more words included in the audio content as the wake word (e.g., the stored wake word, etc.) or not.

At 208, the user device may determine to accept the one or more words included with the audio content as the wake word (or wake phrase) if the confidence score is equal to and/or satisfies the threshold. The user device may determine not to accept the one or more words as the wake word (or wake phrase) if the confidence score does not satisfy the threshold. The threshold may be a value, such as a threshold value of six (6). If the audio content is associated with a confidence score of ten (10), such as one or more words that match (e.g., substantially correspond, approximately 100 percent accuracy, etc.) the stored wake word, then the user device may or may not accept the one or more words as the wake word. If the audio content is associated with a confidence score of eight (8), such as one or more words determined from the audio content that are close (e.g., similar, a partial match, less than percent accuracy, etc.) to the stored wake word (or wake phrase), then the user device may or may not accept the one or more words as the wake word. If the audio content is associated with a confidence score of two (2), such as one or more words that are weakly related (e.g., somewhat similar, a partial match, less than percent accuracy, etc.) to the stored wake word, then the user device may or may not accept the one or more words as the wake word. The user device may or may not accept the one or more words as the wake word based on any correlation between a confidence score and the threshold (e.g., the wake word detection threshold, etc.). The user device may raise and lower the threshold and/or determine which confidence score values do or do not satisfy the threshold to decrease or increase scrutiny applied to wake word detection.

At 209, if the threshold is satisfied, the user device may interact with the user.

At 210, if the threshold is not satisfied, the user device may not interact with the unauthorized user, and/or may remain in an unawakened state (e.g., standby, hibernate, etc.). The high and/or raised threshold may cause the user device to not interact with the user, and/or remain in an unawakened state (e.g., standby, hibernate, etc.), even if the audio content includes the wake word (or wake phrase).
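The flow of process 200 from block 202 through block 210 may be summarized by the following non-limiting sketch. The scoring and voiceprint-matching steps are passed in as functions because this description permits any suitable technique for each; the function name run_process_200, the parameter names, the returned state labels, and the default threshold values of two (2) and nine (9) are assumptions for illustration.

```python
from typing import Any, Callable


def run_process_200(
    audio: Any,
    score_fn: Callable[[Any], int],
    is_authorized_fn: Callable[[Any], bool],
    low_threshold: int = 2,
    high_threshold: int = 9,
) -> str:
    """Return "interact" or "unawakened" for the given audio content."""
    score = score_fn(audio)               # 202: assign a confidence score
    authorized = is_authorized_fn(audio)  # 203: authorized or unauthorized user
    # 205 / 206: lower the threshold for an authorized user, raise it otherwise
    threshold = low_threshold if authorized else high_threshold
    # 207 / 208: the threshold is satisfied by a score equal to or greater than it
    if score >= threshold:
        return "interact"                 # 209: interact with the user
    return "unawakened"                   # 210: remain in standby/hibernate
```

With these example values, audio content scored at eight (8) causes interaction when it originates from an authorized user and leaves the device unawakened when it originates from an unauthorized user or a device such as a television.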

FIG. 3 shows a flowchart of a method 300 for determining a wake word. At 310, it may be determined that audio content comprises one or more words. A user device (e.g., voice assistant device, voice enabled device, smart device, computing device, the user device 101, etc.) may determine that the audio content comprises the one or more words. The user device may have one or more microphones, or the like, that detect/receive the audio content. The user device may detect and/or receive the audio content based on a user speaking in proximity to the user device, a device (e.g., a television, a radio, a computing device, etc.) generating the audio content, and/or the like. The user device may determine the one or more words (e.g., a wake word, a wake phrase, etc.) by performing speech-to-text operations and/or applying one or more voice recognition algorithms to the audio content to extract the one or more words.

The one or more words may be a wake word or may be associated with a wake word (or wake phrase), such as “hey device.” The one or more words may be similar to the wake word (or wake phrase), such as, “Hey, devices are usually sold here.” Words that are similar to the wake word (or wake phrase) may be synonyms of the wake word (or wake phrase), words that share a phonetic relationship with the wake word (or wake phrase), and the like. The wake word (or wake phrase) may be any word and/or words.

At 320, the audio content may be determined to be associated with an authorized user. The user device may determine that the audio content is associated with an authorized user. An authorized user may include a person registered to operate the user device (e.g., based on a user profile, etc.), a person that has permission to use the user device, a user associated with the user device, and/or the like. To determine that the audio content is associated with an authorized user, the user device may determine a voiceprint based on the audio content and compare the determined voiceprint to one or more stored voiceprints (e.g., stored voiceprints of authorized users). The user device may determine that the determined voiceprint is associated with an authorized user because the determined voiceprint corresponds to (matches) a stored voiceprint that is associated with an authorized user and/or user profile. The user device may aggregate one or more voice characteristics associated with the audio content, and the aggregated voice characteristics may represent the voiceprint. Voice characteristics may be and/or include a frequency, a duration, a decibel level, an amplitude, a tone, an inflection, an audio rate, an audio volume, and/or any other such characteristic associated with the audio content.
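As a minimal sketch of the voiceprint comparison described at 320, the following aggregates a few voice characteristics into a vector and compares it to stored voiceprints. The chosen characteristics, the relative-distance measure, the tolerance value, and all names are hypothetical; the description does not prescribe a particular matching technique.

```python
import math
from typing import Dict, List, Optional

# Hypothetical characteristic names and tolerance; a real device would choose its own.
FEATURES = ("frequency_hz", "decibel_level", "audio_rate")
MATCH_TOLERANCE = 0.10  # assumed maximum relative distance that counts as a match


def build_voiceprint(characteristics: Dict[str, float]) -> List[float]:
    """Aggregate selected voice characteristics into an ordered vector."""
    return [characteristics[name] for name in FEATURES]


def relative_distance(sample: List[float], reference: List[float]) -> float:
    """Euclidean norm of per-feature relative differences (reference values must be nonzero)."""
    return math.sqrt(sum(((s - r) / r) ** 2 for s, r in zip(sample, reference)))


def find_authorized_user(
    voiceprint: List[float], stored: Dict[str, List[float]]
) -> Optional[str]:
    """Return the user whose stored voiceprint is closest within tolerance, else None."""
    best_user, best_distance = None, MATCH_TOLERANCE
    for user, reference in stored.items():
        distance = relative_distance(voiceprint, reference)
        if distance <= best_distance:
            best_user, best_distance = user, distance
    return best_user


stored_voiceprints = {"alice": [180.0, 60.0, 4.0]}  # hypothetical enrolled user
speaker = build_voiceprint(
    {"frequency_hz": 182.0, "decibel_level": 61.0, "audio_rate": 4.1}
)
television = build_voiceprint(
    {"frequency_hz": 120.0, "decibel_level": 75.0, "audio_rate": 5.5}
)
print(find_authorized_user(speaker, stored_voiceprints))
print(find_authorized_user(television, stored_voiceprints))
```

A result of `None` here stands for audio content that is not associated with any authorized user, such as audio generated by a television.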

At 330, a threshold (e.g., wake word detection threshold, etc.) may be lowered. The user device may lower the threshold. The user device may lower the threshold based on the determination that the audio content is associated with an authorized user. The threshold may be a value, such as a threshold value of six (6). The threshold may be lowered to a lesser value, such as a threshold value of one (1), because the audio content is associated with the authorized user. A low threshold may cause the user device to interact with a user (e.g., detect and/or process user commands and/or operational instructions, etc.) if audio content includes a wake word (or wake phrase) (e.g., matches a stored wake word/phrase, etc.), or includes one or more words that are similar to the wake word (or wake phrase) (e.g., similar to a stored wake word/phrase, etc.).

At 340, it may be determined that at least a portion of the one or more words corresponds to a wake word (or wake phrase). The user device may determine that at least a portion of the one or more words corresponds to the wake word (or wake phrase) by assigning a confidence score to the one or more words that is indicative of a correlation to the wake word (or wake phrase). A confidence score may be based on a scale, such as from a value of one (1) to ten (10), where the scale values correspond to a degree of correlation between the one or more words and the wake word (or wake phrase). A confidence score may be based on any scale and/or value. The user device may assign a confidence score of ten (10) to the one or more words if they include words such as “hey device,” that match the wake word (or wake phrase) “hey device.” The user device may assign a confidence score of two (2) to the one or more words if they include words such as “hey, do you want tacos tonight,” because a portion of the one or more words (e.g., “hey, do . . .”) is weakly related (e.g., somewhat similar, a partial match, less than 100 percent accuracy, etc.) to the wake word (or wake phrase) “hey device.” The user device may assign any confidence score to the one or more words that is indicative of the correlation to the wake word (or wake phrase).
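One hypothetical way to assign such a score is sketched below, using plain string similarity from Python's standard `difflib` module as a stand-in for whatever correlation measure a device actually applies. The function name, the sliding-window approach, and the mapping onto a one-to-ten scale are assumptions, and the values it produces will not necessarily reproduce the example scores given above.

```python
import re
from difflib import SequenceMatcher


def confidence_score(words: str, wake_phrase: str = "hey device") -> int:
    """Score, from 1 to 10, how closely any run of words matches the wake phrase."""
    tokens = re.findall(r"[a-z']+", words.lower())
    target = re.findall(r"[a-z']+", wake_phrase.lower())
    window = len(target)
    best = 0.0
    # Slide a window the size of the wake phrase over the transcribed words.
    for start in range(max(1, len(tokens) - window + 1)):
        candidate = " ".join(tokens[start:start + window])
        ratio = SequenceMatcher(None, candidate, " ".join(target)).ratio()
        best = max(best, ratio)
    # Map the 0.0 to 1.0 similarity ratio onto the one-to-ten scale.
    return max(1, round(best * 10))


print(confidence_score("hey device"))
print(confidence_score("hey, do you want tacos tonight"))
```

An exact match receives the top score, and phrases that only share a leading "hey" receive a lower one.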

The user device may compare and/or apply the confidence score associated with the one or more words to the threshold (e.g., wake word detection threshold, etc.) to determine, and/or as an indication, that at least a portion of the one or more words corresponds to the wake word (or wake phrase). The user device may compare and/or apply the confidence score associated with the one or more words to the threshold to determine whether to accept at least the portion of the one or more words as the wake word (e.g., the stored wake word, etc.) or not (e.g., determine how closely the portion of the one or more words matches the wake word/phrase, etc.).

The user device, based on the lowered threshold, may determine that at least the portion of the one or more words corresponds to the wake word (or wake phrase). The user device may determine to accept at least the portion of the one or more words as the wake word (or wake phrase) if the confidence score satisfies the threshold. The user device may accept at least the portion of the one or more words “hey, do you want tacos tonight,” that are assigned the confidence score of two (2), as the wake word (or wake phrase) because the confidence score of two (2) is greater than the lowered threshold value of one (1) (e.g., the confidence score satisfies the threshold).
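The effect of the lowered threshold in method 300 may be illustrated with the example values above. This is a sketch under stated assumptions: the names are hypothetical, and "satisfies" is read as "greater than or equal to."

```python
BASELINE_THRESHOLD = 6  # example starting value from the description
LOWERED_THRESHOLD = 1   # example lowered value for an authorized user


def threshold_for(is_authorized: bool) -> int:
    """Lower the wake word detection threshold only for an authorized user."""
    return LOWERED_THRESHOLD if is_authorized else BASELINE_THRESHOLD


def accept_as_wake_word(score: int, is_authorized: bool) -> bool:
    # Assumption: the score satisfies the threshold when it is at least as large.
    return score >= threshold_for(is_authorized)


# "hey, do you want tacos tonight" was assigned a confidence score of two (2).
print(accept_as_wake_word(2, is_authorized=True))
print(accept_as_wake_word(2, is_authorized=False))
```

With the lowered threshold of one, the score of two is accepted; against the unlowered example threshold of six, the same score is not.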

At 350, one or more operational commands may be executed. The user device may execute the one or more operational commands. The user device may execute the one or more operational commands based on the determination that at least the portion of the one or more words corresponds to the wake word (or wake phrase). The user device may interact with a user to detect and execute user commands and/or operational instructions. The user commands and/or operational instructions may originate from the authorized user. The user commands and/or operational instructions may be used to control and/or manage operations associated with the user device and/or a device associated with the user device, such as a target device. Executing the one or more operational commands may include sending the one or more operational commands to a target device that executes the one or more operational commands.
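A minimal sketch of step 350 follows, in which commands are acted on only after the wake word has been accepted and may optionally be sent to a target device. The `TargetDevice` class, the `execute_commands` function, and the modeling of "sending" as a direct method call are all hypothetical simplifications; no particular transport or command format is implied.

```python
from typing import List, Optional


class TargetDevice:
    """Hypothetical stand-in for a device controlled by the user device."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.executed: List[str] = []

    def execute(self, command: str) -> None:
        self.executed.append(command)


def execute_commands(
    commands: List[str],
    wake_word_accepted: bool,
    target: Optional[TargetDevice] = None,
) -> List[str]:
    """Return the commands that were acted on (none if the wake word was rejected)."""
    if not wake_word_accepted:
        return []  # the device remains in its unawakened state
    if target is not None:
        for command in commands:
            target.execute(command)  # "sending" is modeled as a direct method call
    return list(commands)


thermostat = TargetDevice("thermostat")
print(execute_commands(["set temperature to 70"], True, thermostat))
print(thermostat.executed)
```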

FIG. 4 shows a flowchart of a method 400 for determining a wake word. At 410, it may be determined that audio content comprises one or more words. A user device (e.g., voice assistant device, voice enabled device, smart device, computing device, the user device 101, etc.) may determine that the audio content comprises the one or more words. The user device may have one or more microphones, or the like, that detect/receive the audio content. The user device may detect and/or receive the audio content based on a user speaking in proximity to the user device, a device (e.g., a television, a radio, a computing device, etc.) generating the audio content, and/or the like. The user device may determine the one or more words (e.g., a wake word, a wake phrase, etc.) by performing speech-to-text operations and/or applying one or more voice recognition algorithms to the audio content to extract the one or more words.

The one or more words may be a wake word or may be associated with a wake word (or wake phrase), such as “hey device.” The one or more words may be similar to the wake word (or wake phrase), such as, “Hey, devices are usually sold here.” Words that are similar to the wake word (or wake phrase) may be synonyms of the wake word (or wake phrase), words that share a phonetic relationship with the wake word (or wake phrase), and the like. The wake word (or wake phrase) may be any word and/or words.

At 420, the audio content may be determined to be associated with an unauthorized user (e.g., a non-authorized user, a user not associated with an authorized user, a device, etc.). The user device may determine that the audio content is associated with an unauthorized user. An unauthorized user may include a person that is not registered to operate the user device (e.g., based on a user profile, etc.), a person that does not have permission to use the user device, a user that is not associated with the user device, and/or the like, such as a guest in a home where the user device may be located, a device (e.g., a television, a radio, a computing device, etc.) generating audio content, and/or the like. To determine that the audio content is associated with an unauthorized user, the user device may determine a voiceprint based on the audio content and compare the determined voiceprint to one or more stored voiceprints (e.g., stored voiceprints of authorized users). The user device may determine that the determined voiceprint is not associated with an authorized user because the determined voiceprint does not correspond to (match) a stored voiceprint that is associated with an authorized user and/or user profile. The user device may aggregate one or more voice characteristics associated with the audio content, and the aggregated voice characteristics may represent the voiceprint. Voice characteristics may be and/or include a frequency, a duration, a decibel level, an amplitude, a tone, an inflection, an audio rate, an audio volume, and/or any other such characteristic associated with the audio content.

At 430, a threshold (e.g., wake word detection threshold, etc.) may be raised. The user device may raise the threshold (e.g., wake word detection threshold, etc.). The user device may raise the wake word threshold based on the determination that the audio content is associated with an unauthorized user. The threshold may be a value, such as a threshold value of five (5). The threshold may be raised to a higher value, such as a threshold value of seven (7), because the audio content is associated with the unauthorized user. A high (or raised) threshold (e.g., wake word detection threshold, etc.) may cause the user device to interact with an unauthorized user (e.g., detect and/or process user commands and/or operational instructions, etc.) only if the audio content includes the wake word (or wake phrase) or a word/phrase that substantially matches/corresponds to the wake word (or wake phrase).

At 440, it may be determined that at least a portion of the one or more words corresponds to a wake word (or wake phrase). The user device may determine that at least a portion of the one or more words corresponds to the wake word (or wake phrase) by assigning a confidence score to the one or more words that is indicative of a correlation to the wake word (or wake phrase). A confidence score may be based on a scale, such as from a value of one (1) to ten (10), where the scale values correspond to a degree of correlation between the one or more words and the wake word (or wake phrase). A confidence score may be based on any scale and/or value. The user device may assign a confidence score of ten (10) to the one or more words if they include words such as “hey device,” that match the wake word (or wake phrase) “hey device.” The user device may assign a confidence score of eight (8) to the one or more words if they include words such as “hey, devices are usually sold here,” because a portion of the one or more words (e.g., “hey devices”) closely matches (e.g., similar, a partial match, less than 100 percent accuracy, etc.) the wake word (or wake phrase) “hey device.” The user device may assign any confidence score to the one or more words that is indicative of the correlation to the wake word (or wake phrase).

The user device may compare and/or apply the confidence score associated with the one or more words to the threshold (e.g., wake word detection threshold, etc.) to determine, and/or as an indication, that at least a portion of the one or more words corresponds to the wake word (or wake phrase). The user device may compare and/or apply the confidence score associated with the one or more words to the threshold to determine whether to accept at least the portion of the one or more words as the wake word (e.g., the stored wake word, etc.) or not (e.g., determine how closely the portion of the one or more words matches the wake word/phrase, etc.).

The user device, based on the raised threshold, may be required to determine that at least the portion of the one or more words matches the wake word (or wake phrase). The user device may determine to accept at least the portion of the one or more words as the wake word (or wake phrase) if the confidence score satisfies the threshold. The user device may accept at least the portion of the one or more words “hey device,” that are assigned the confidence score of ten (10), as the wake word (or wake phrase) because the confidence score of ten (10) is greater than the raised threshold value of seven (7) (e.g., the confidence score satisfies the threshold).
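The raised threshold of method 400 may be sketched as follows, using the example values of five (5) and seven (7). The class and method names are hypothetical, "satisfies" is read as "greater than or equal to," and the confidence score of six used below is an illustrative value that does not appear in the description.

```python
from dataclasses import dataclass


@dataclass
class WakeWordDetector:
    threshold: int = 5  # example starting value from the description

    def raise_threshold(self, raised_value: int = 7) -> None:
        """Increase scrutiny; an already higher threshold is left unchanged."""
        self.threshold = max(self.threshold, raised_value)

    def accepts(self, score: int) -> bool:
        # Assumption: the score satisfies the threshold when it is at least as large.
        return score >= self.threshold


detector = WakeWordDetector()
before = detector.accepts(6)   # illustrative score of six against a threshold of five
detector.raise_threshold()     # audio content attributed to an unauthorized user
after = detector.accepts(6)    # same score against the raised threshold of seven
print(before, after, detector.accepts(10))
```

In this sketch a partial match that would have been accepted at the starting threshold is rejected once the threshold is raised, while an exact match scored at ten is still accepted.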

The user device may, based on determining that the audio content is associated with an unauthorized user (e.g., a non-authorized user, a user not associated with an authorized user, etc.), require that the audio content include the wake word (or wake phrase) prior to interacting with the unauthorized user. The user device may not interact with the unauthorized user based on determining that the audio content is associated with the unauthorized user. A high (or raised) threshold may cause the user device to not interact with the unauthorized user even if the audio content includes the wake word (or wake phrase).

At 450, one or more operational commands may be executed. The user device may execute the one or more operational commands. The user device may execute the one or more operational commands based on the determination that at least the portion of the one or more words corresponds to the wake word (or wake phrase). The user device may interact with a user to detect and execute user commands and/or operational instructions. The user commands and/or operational instructions may originate from the unauthorized user. The user commands and/or operational instructions may be used to control and/or manage operations associated with the user device and/or a device associated with the user device, such as a target device. Executing the one or more operational commands may include sending the one or more operational commands to a target device that executes the one or more operational commands.

FIG. 5 shows a flowchart of a method 500 for determining a wake word. At 510, it may be determined that audio content comprises one or more words. A user device (e.g., voice assistant device, voice enabled device, smart device, computing device, the user device 101, etc.) may determine that the audio content comprises the one or more words. The user device may have one or more microphones, or the like, that detect/receive the audio content. The user device may detect and/or receive the audio content based on a user speaking in proximity to the user device, a device (e.g., a television, a radio, a computing device, etc.) generating the audio content, and/or the like. The user device may determine the one or more words (e.g., a wake word, a wake phrase, etc.) by performing speech-to-text operations and/or applying one or more voice recognition algorithms to the audio content to extract the one or more words.

The one or more words may be a wake word or may be associated with a wake word (or wake phrase), such as “hey device.” The one or more words may be similar to the wake word (or wake phrase), such as, “Hey, devices are usually sold here.” Words that are similar to the wake word (or wake phrase) may be synonyms of the wake word (or wake phrase), words that share a phonetic relationship with the wake word (or wake phrase), and the like. The wake word (or wake phrase) may be any word and/or words.

At 520, one or more voice characteristics associated with the audio content may be determined. The one or more voice characteristics associated with the audio content may be indicative of whether the audio content is associated with an authorized user or an unauthorized user. An authorized user may include a person registered to operate the user device (e.g., based on a user profile, etc.), a person that has permission to use the user device, a user associated with the user device, and/or the like. An unauthorized user may include a person that is not registered to operate the user device (e.g., based on a user profile, etc.), a person that does not have permission to use the user device, a user that is not associated with the user device and/or an authorized user, and/or the like, such as a guest in a home where the user device may be located, a device (e.g., a television, a radio, a computing device, etc.) generating audio content, and/or the like.

At 530, a threshold (e.g., wake word detection threshold, etc.) may be determined. The user device may determine to use the threshold based on the one or more voice characteristics associated with the audio content. The user device may use the one or more voice characteristics associated with the audio content to determine a voiceprint. The user device may aggregate the one or more voice characteristics associated with the audio content, and the aggregated voice characteristics may represent the voiceprint. Voice characteristics may be and/or include a frequency, a duration, a decibel level, an amplitude, a tone, an inflection, an audio rate, an audio volume, and/or any other such characteristic associated with the audio content.

The user device may compare the determined voiceprint to one or more stored voiceprints (e.g., stored voiceprints of authorized users). The user device may determine that the determined voiceprint is associated with an authorized user when the determined voiceprint corresponds to (matches) a stored voiceprint that is associated with an authorized user and/or user profile. The user device may determine that the determined voiceprint is not associated with an authorized user when the determined voiceprint does not correspond to (match) a stored voiceprint that is associated with an authorized user and/or user profile.

The user device may lower the threshold. The user device may lower the threshold based on the one or more voice characteristics indicating that the audio content is associated with an authorized user. The threshold may be a value, such as a threshold value of six (6). The threshold may be lowered to a lesser value, such as a threshold value of one (1), because the audio content is associated with the authorized user. A low threshold may cause the user device to interact with a user (e.g., detect and/or process user commands and/or operational instructions, etc.) if audio content includes a wake word (or wake phrase) (e.g., matches a stored wake word/phrase, etc.), or includes one or more words that are similar to the wake word (or wake phrase) (e.g., similar to a stored wake word/phrase, etc.).

The user device may raise the threshold (e.g., wake word detection threshold, etc.). The user device may raise the wake word threshold based on the determination that the audio content is associated with an unauthorized user. The threshold may be a value, such as a threshold value of five (5). The threshold may be raised to a higher value, such as a threshold value of seven (7), because the audio content is associated with the unauthorized user. A high (or raised) threshold (e.g., wake word detection threshold, etc.) may cause the user device to interact with an unauthorized user (e.g., detect and/or process user commands and/or operational instructions, etc.) only if the audio content includes the wake word (or wake phrase) or a word/phrase that substantially matches/corresponds to the wake word (or wake phrase).
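The threshold determination at 530 may be sketched as a selection made for each piece of audio content, depending on whether its voice characteristics indicate an authorized or an unauthorized origin. The `Origin` enumeration and the function names are hypothetical, the values one (1) and seven (7) are the example lowered and raised thresholds above, and "satisfies" is read as "greater than or equal to."

```python
from enum import Enum


class Origin(Enum):
    AUTHORIZED = "authorized"
    UNAUTHORIZED = "unauthorized"


LOWERED_THRESHOLD = 1  # example lowered value from the description
RAISED_THRESHOLD = 7   # example raised value from the description


def determine_threshold(origin: Origin) -> int:
    """Pick the wake word detection threshold for one piece of audio content."""
    if origin is Origin.AUTHORIZED:
        return LOWERED_THRESHOLD
    return RAISED_THRESHOLD


def wake_word_detected(score: int, origin: Origin) -> bool:
    # Assumption: a score satisfies the threshold when it is greater than or equal to it.
    return score >= determine_threshold(origin)


# The same confidence score of two (2) is treated differently depending on origin.
for origin in Origin:
    print(origin.value, determine_threshold(origin), wake_word_detected(2, origin))
```

Because the threshold is chosen per piece of audio content, audio from an authorized user followed by audio from a television would be evaluated first against the lowered value and then against the raised value.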

At 540, it may be determined that at least a portion of the one or more words corresponds to a wake word (or wake phrase). The user device may determine that at least a portion of the one or more words corresponds to the wake word (or wake phrase) by assigning a confidence score to the one or more words that is indicative of a correlation to the wake word (or wake phrase). A confidence score may be based on a scale, such as from a value of one (1) to ten (10), where the scale values correspond to a degree of correlation between the one or more words and the wake word (or wake phrase). A confidence score may be based on any scale and/or value. The user device may assign a confidence score of ten (10) to the one or more words if they include words such as “hey device,” that match the wake word (or wake phrase) “hey device.” The user device may assign a confidence score of eight (8) to the one or more words if they include words such as “hey, devices are usually sold here,” because a portion of the one or more words (e.g., “hey devices”) closely matches (e.g., similar, a partial match, less than 100 percent accuracy, etc.) the wake word (or wake phrase) “hey device.” The user device may assign a confidence score of two (2) to the one or more words if they include words such as “hey, do you want tacos tonight,” because a portion of the one or more words (e.g., “hey, do . . .”) is weakly related (e.g., somewhat similar, a partial match, less than 100 percent accuracy, etc.) to the wake word (or wake phrase) “hey device.” The user device may assign any confidence score to the one or more words that is indicative of the correlation to the wake word (or wake phrase).

The user device may compare and/or apply the confidence score associated with the one or more words to the threshold (e.g., wake word detection threshold, etc.) to determine, and/or as an indication, that at least a portion of the one or more words corresponds to the wake word (or wake phrase). The user device may compare and/or apply the confidence score associated with the one or more words to the threshold to determine whether to accept at least the portion of the one or more words as the wake word (e.g., the stored wake word, etc.) or not (e.g., determine how closely the portion of the one or more words matches the wake word/phrase, etc.).

The user device, based on a lowered threshold, may determine that at least the portion of the one or more words corresponds to the wake word (or wake phrase). The user device may determine to accept at least the portion of the one or more words as the wake word (or wake phrase) if the confidence score satisfies the threshold. The user device may accept at least the portion of the one or more words “hey, do you want tacos tonight,” that are assigned the confidence score of two (2), as the wake word (or wake phrase) because the confidence score of two (2) is greater than the lowered threshold value of one (1) (e.g., the confidence score satisfies the threshold).

The user device, based on a raised threshold, may be required to determine that at least the portion of the one or more words matches the wake word (or wake phrase). The user device may determine to accept at least the portion of the one or more words as the wake word (or wake phrase) if the confidence score satisfies the threshold. The user device may accept at least the portion of the one or more words “hey device,” that are assigned the confidence score of ten (10), as the wake word (or wake phrase) because the confidence score of ten (10) is greater than the raised threshold value of seven (7) (e.g., the confidence score satisfies the threshold).

The user device, based on the one or more voice characteristics indicating that the audio content is associated with an unauthorized user (e.g., a non-authorized user, a user not associated with an authorized user, etc.), may require that the audio content include the wake word (or wake phrase) prior to interacting with the unauthorized user. The user device may not interact with the unauthorized user based on determining that the audio content is associated with the unauthorized user. A high (or raised) threshold may cause the user device to not interact with the unauthorized user even if the audio content includes the wake word (or wake phrase).

One or more operational commands may be executed. The user device may execute the one or more operational commands. The user device may execute the one or more operational commands based on the determination that at least the portion of the one or more words corresponds to the wake word (or wake phrase). The user device may interact with a user to detect and execute user commands and/or operational instructions. The user commands and/or operational instructions may originate from an authorized user or an unauthorized user. The user commands and/or operational instructions may be used to control and/or manage operations associated with the user device and/or a device associated with the user device, such as a target device. Executing the one or more operational commands may include sending the one or more operational commands to a target device that executes the one or more operational commands.

FIG. 6 shows a system 600 for determining a wake word. The user device 101 and the computing device 106 of FIG. 1 may each be a computer 601 as shown in FIG. 6. The computer 601 may comprise one or more processors 603, a system memory 612, and a bus 613 that couples various components of the computer 601 including the one or more processors 603 to the system memory 612. In the case of multiple processors 603, the computer 601 may utilize parallel computing.

The bus 613 may comprise one or more of several possible types of bus structures, such as a memory bus, memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.

The computer 601 may operate on and/or comprise a variety of computer readable media (e.g., non-transitory). Computer readable media may be any available media that is accessible by the computer 601 and comprises non-transitory, volatile and/or non-volatile media, removable and non-removable media. The system memory 612 has computer readable media in the form of volatile memory, such as random access memory (RAM), and/or non-volatile memory, such as read only memory (ROM). The system memory 612 may store data such as user and wake word data 607 and/or program modules such as operating system 605 and wake word detection software 606 that are accessible to and/or are operated on by the one or more processors 603.

The computer 601 may also comprise other removable/non-removable, volatile/non-volatile computer storage media. The mass storage device 604 may provide non-volatile storage of computer code, computer readable instructions, data structures, program modules, and other data for the computer 601. The mass storage device 604 may be a hard disk, a removable magnetic disk, a removable optical disk, magnetic cassettes or other magnetic storage devices, flash memory cards, CD-ROM, digital versatile disks (DVD) or other optical storage, random access memories (RAM), read only memories (ROM), electrically erasable programmable read-only memory (EEPROM), and the like.

Any number of program modules may be stored on the mass storage device 604. An operating system 605 and wake word detection software 606 may be stored on the mass storage device 604. One or more of the operating system 605 and wake word detection software 606 (or some combination thereof) may comprise program modules and the wake word detection software 606. User and wake word data 607 may also be stored on the mass storage device 604. User and wake word data 607 may be stored in any of one or more databases known in the art. The databases may be centralized or distributed across multiple locations within the network 615.

A user may enter commands and information into the computer 601 via an input device (not shown). Such input devices comprise, but are not limited to, a keyboard, pointing device (e.g., a computer mouse, remote control), a microphone, a joystick, a scanner, tactile input devices such as gloves and other body coverings, a motion sensor, and the like. These and other input devices may be connected to the one or more processors 603 via a human machine interface 602 that is coupled to the bus 613, but may be connected by other interface and bus structures, such as a parallel port, game port, an IEEE 1394 Port (also known as a Firewire port), a serial port, network adapter 608, and/or a universal serial bus (USB).

A display device 611 may also be connected to the bus 613 via an interface, such as a display adapter 609. It is contemplated that the computer 601 may have more than one display adapter 609 and the computer 601 may have more than one display device 611. A display device 611 may be a monitor, an LCD (Liquid Crystal Display), light emitting diode (LED) display, television, smart lens, smart glass, and/or a projector. In addition to the display device 611, other output peripheral devices may comprise components such as speakers (not shown) and a printer (not shown) which may be connected to the computer 601 via Input/Output Interface 610. Any step and/or result of the methods may be output (or caused to be output) in any form to an output device. Such output may be any form of visual representation, including, but not limited to, textual, graphical, animation, audio, tactile, and the like. The display device 611 and computer 601 may be part of one device, or separate devices.

The computer 601 may operate in a networked environment using logical connections to one or more remote computing devices 614 a,b,c. A remote computing device 614 a,b,c may be a personal computer, computing station (e.g., workstation), portable computer (e.g., laptop, mobile phone, tablet device), smart device (e.g., smartphone, smart watch, activity tracker, smart apparel, smart accessory), security and/or monitoring device, a server, a router, a network computer, a peer device, edge device or other common network node, and so on. Logical connections between the computer 601 and a remote computing device 614 a,b,c may be made via a network 615, such as a local area network (LAN) and/or a general wide area network (WAN). Such network connections may be through a network adapter 608. A network adapter 608 may be implemented in both wired and wireless environments. Such networking environments are conventional and commonplace in dwellings, offices, enterprise-wide computer networks, intranets, and the Internet.

Application programs and other executable program components such as the operating system 605 are shown herein as discrete blocks, although it is recognized that such programs and components may reside at various times in different storage components of the computer 601, and are executed by the one or more processors 603 of the computer 601. An implementation of wake word detection software 606 may be stored on or sent across some form of computer readable media. Any of the disclosed methods may be performed by processor-executable instructions embodied on computer readable media.

While specific configurations have been described, it is not intended that the scope be limited to the particular configurations set forth, as the configurations herein are intended in all respects to be possible configurations rather than restrictive.

Unless otherwise expressly stated, it is in no way intended that any method set forth herein be construed as requiring that its steps be performed in a specific order. Accordingly, where a method claim does not actually recite an order to be followed by its steps or it is not otherwise specifically stated in the claims or descriptions that the steps are to be limited to a specific order, it is in no way intended that an order be inferred, in any respect. This holds for any possible non-express basis for interpretation, including: matters of logic with respect to arrangement of steps or operational flow; plain meaning derived from grammatical organization or punctuation; the number or type of configurations described in the specification.

It will be apparent to those skilled in the art that various modifications and variations may be made without departing from the scope or spirit. Other configurations will be apparent to those skilled in the art from consideration of the specification and practice described herein. It is intended that the specification and described configurations be considered as exemplary only, with a true scope and spirit being indicated by the following claims.

What is claimed is:
 1. A method comprising: receiving, by a computing device, audio content; based on one or more voice characteristics associated with the audio content that indicate that the audio content is associated with an authorized user, determining to lower a wake word threshold for processing the audio content; and based on a determination, using the lowered wake word threshold, that at least a portion of the audio content corresponds to a wake word or phrase, causing execution of one or more operational commands associated with the audio content.
 2. The method of claim 1, wherein the one or more voice characteristics comprises one or more of: a frequency, a decibel level, or a tone.
 3. The method of claim 1, wherein the determination that at least the portion of the audio content corresponds to the wake word or phrase comprises: determining, based on the audio content, one or more words in the at least the portion of the audio content satisfy the lowered wake word threshold.
 4. The method of claim 1, wherein the lowered wake word threshold is associated with a lower confidence level requirement that the audio content comprises the wake word or phrase.
 5. The method of claim 1, wherein the lowered wake word threshold is associated with one or more authorized users comprising the authorized user and a higher wake word threshold is associated with an origin of the audio content that is not associated with the one or more authorized users.
 6. The method of claim 1, wherein the one or more operational commands are associated with a target device and wherein causing execution of the one or more operational commands comprises sending, to the target device, the one or more operational commands.
 7. The method of claim 1, further comprising: receiving second audio content; and based on one or more second voice characteristics associated with the second audio content that indicate that the second audio content is not associated with one or more authorized users comprising the authorized user, increasing the wake word threshold, from the lowered wake word threshold, for processing the second audio content.
 8. A method comprising: receiving, by a computing device, audio content; based on one or more voice characteristics associated with the audio content that indicate that the audio content is not associated with one or more authorized users, determining to increase a wake word threshold for processing the audio content; and based on a determination, using the increased wake word threshold, that at least a portion of the audio content corresponds to a wake word or phrase, causing execution of one or more operational commands associated with the audio content.
 9. The method of claim 8, wherein the one or more voice characteristics comprises one or more of: a frequency, a decibel level, or a tone.
 10. The method of claim 8, wherein the increased wake word threshold is associated with a higher confidence level requirement that the audio content comprises the wake word or phrase.
 11. The method of claim 8, wherein the determination that at least the portion of the audio content corresponds to the wake word or phrase comprises: determining, based on the audio content, one or more words in the at least the portion of the audio content satisfy the increased wake word threshold.
 12. The method of claim 8, wherein the increased wake word threshold is greater than a lower wake word threshold associated with the one or more authorized users.
 13. The method of claim 8, further comprising: receiving second audio content; and based on one or more second voice characteristics associated with the second audio content that indicate that the second audio content is associated with one or more of the one or more authorized users, lowering the wake word threshold, from the increased wake word threshold, for processing the second audio content.
 14. The method of claim 8, wherein the one or more operational commands are associated with a target device and wherein causing execution of the one or more operational commands comprises sending, to the target device, the one or more operational commands.
 15. One or more non-transitory computer-readable media storing processor-executable instructions that, when executed by at least one processor, cause the at least one processor to: receive audio content; based on one or more voice characteristics associated with the audio content that indicate that the audio content is associated with an authorized user, determine to lower a wake word threshold for processing the audio content; and based on a determination, using the lowered wake word threshold, that at least a portion of the audio content corresponds to a wake word or phrase, cause execution of one or more operational commands associated with the audio content.
 16. The one or more non-transitory computer-readable media of claim 15, wherein the one or more voice characteristics comprises one or more of: a frequency, a decibel level, or a tone.
 17. The one or more non-transitory computer-readable media of claim 15, wherein the processor-executable instructions that, when executed by the at least one processor, cause the at least one processor to determine that at least the portion of the audio content corresponds to the wake word or phrase, cause the at least one processor to determine, based on the audio content, one or more words in the at least the portion of the audio content satisfy the lowered wake word threshold.
 18. The one or more non-transitory computer-readable media of claim 15, wherein the lowered wake word threshold is associated with a lower confidence level requirement that the audio content comprises the wake word or phrase.
 19. The one or more non-transitory computer-readable media of claim 15, wherein the lowered wake word threshold is associated with one or more authorized users comprising the authorized user and a higher wake word threshold is associated with an origin of the audio content that is not associated with the one or more authorized users.
 20. The one or more non-transitory computer-readable media of claim 15, wherein the one or more operational commands are associated with a target device and wherein the processor-executable instructions that, when executed by the at least one processor, cause the at least one processor to cause execution of the one or more operational commands, cause the at least one processor to send, to the target device, the one or more operational commands.
 21. The one or more non-transitory computer-readable media of claim 15, wherein the processor-executable instructions, when executed by the at least one processor, further cause the at least one processor to: receive second audio content; and based on one or more second voice characteristics associated with the second audio content that indicate that the second audio content is not associated with one or more authorized users comprising the authorized user, increase the wake word threshold, from the lowered wake word threshold, for processing the second audio content.
 22. One or more non-transitory computer-readable media storing processor-executable instructions that, when executed by at least one processor, cause the at least one processor to: receive audio content; based on one or more voice characteristics associated with the audio content that indicate that the audio content is not associated with one or more authorized users, determine to increase a wake word threshold for processing the audio content; and based on a determination, using the increased wake word threshold, that at least a portion of the audio content corresponds to a wake word or phrase, cause execution of one or more operational commands associated with the audio content.
 23. The one or more non-transitory computer-readable media of claim 22, wherein the one or more voice characteristics comprises one or more of: a frequency, a decibel level, or a tone.
 24. The one or more non-transitory computer-readable media of claim 22, wherein the increased wake word threshold is associated with a higher confidence level requirement that the audio content comprises the wake word or phrase.
 25. The one or more non-transitory computer-readable media of claim 22, wherein the processor-executable instructions that, when executed by the at least one processor, cause the at least one processor to determine that at least the portion of the audio content corresponds to the wake word or phrase, cause the at least one processor to determine, based on the audio content, one or more words in the at least the portion of the audio content satisfy the increased wake word threshold.
 26. The one or more non-transitory computer-readable media of claim 22, wherein the increased wake word threshold is greater than a lower wake word threshold associated with the one or more authorized users.
 27. The one or more non-transitory computer-readable media of claim 22, wherein the processor-executable instructions, when executed by the at least one processor, further cause the at least one processor to: receive second audio content; and based on one or more second voice characteristics associated with the second audio content that indicate that the second audio content is associated with one or more of the one or more authorized users, lower the wake word threshold, from the increased wake word threshold, for processing the second audio content.
 28. The one or more non-transitory computer-readable media of claim 22, wherein the one or more operational commands are associated with a target device and wherein the processor-executable instructions that, when executed by the at least one processor, cause the at least one processor to cause execution of the one or more operational commands, cause the at least one processor to send, to the target device, the one or more operational commands.
 29. An apparatus comprising: one or more processors; and memory storing processor-executable instructions that, when executed by the one or more processors, cause the apparatus to: receive audio content; based on one or more voice characteristics associated with the audio content that indicate that the audio content is associated with an authorized user, determine to lower a wake word threshold for processing the audio content; and based on a determination, using the lowered wake word threshold, that at least a portion of the audio content corresponds to a wake word or phrase, cause execution of one or more operational commands associated with the audio content.
 30. The apparatus of claim 29, wherein the one or more voice characteristics comprises one or more of: a frequency, a decibel level, or a tone.
 31. The apparatus of claim 29, wherein the processor-executable instructions that, when executed by the one or more processors, cause the apparatus to determine that at least the portion of the audio content corresponds to the wake word or phrase, cause the apparatus to determine, based on the audio content, one or more words in the at least the portion of the audio content satisfy the lowered wake word threshold.
 32. The apparatus of claim 29, wherein the lowered wake word threshold is associated with a lower confidence level requirement that the audio content comprises the wake word or phrase.
 33. The apparatus of claim 29, wherein the lowered wake word threshold is associated with one or more authorized users comprising the authorized user and a higher wake word threshold is associated with an origin of the audio content that is not associated with the one or more authorized users.
 34. The apparatus of claim 29, wherein the one or more operational commands are associated with a target device and wherein the processor-executable instructions that, when executed by the one or more processors, cause the apparatus to cause execution of the one or more operational commands, cause the apparatus to send, to the target device, the one or more operational commands.
 35. The apparatus of claim 29, wherein the processor-executable instructions, when executed by the one or more processors, further cause the apparatus to: receive second audio content; and based on one or more second voice characteristics associated with the second audio content that indicate that the second audio content is not associated with one or more authorized users comprising the authorized user, increase the wake word threshold, from the lowered wake word threshold, for processing the second audio content.
 36. An apparatus comprising: one or more processors; and memory storing processor-executable instructions that, when executed by the one or more processors, cause the apparatus to: receive audio content; based on one or more voice characteristics associated with the audio content that indicate that the audio content is not associated with one or more authorized users, determine to increase a wake word threshold for processing the audio content; and based on a determination, using the increased wake word threshold, that at least a portion of the audio content corresponds to a wake word or phrase, cause execution of one or more operational commands associated with the audio content.
 37. The apparatus of claim 36, wherein the one or more voice characteristics comprises one or more of: a frequency, a decibel level, or a tone.
 38. The apparatus of claim 36, wherein the increased wake word threshold is associated with a higher confidence level requirement that the audio content comprises the wake word or phrase.
 39. The apparatus of claim 36, wherein the processor-executable instructions that, when executed by the one or more processors, cause the apparatus to determine that at least the portion of the audio content corresponds to the wake word or phrase, cause the apparatus to determine, based on the audio content, one or more words in the at least the portion of the audio content satisfy the increased wake word threshold.
 40. The apparatus of claim 36, wherein the increased wake word threshold is greater than a lower wake word threshold associated with the one or more authorized users.
 41. The apparatus of claim 36, wherein the processor-executable instructions, when executed by the one or more processors, further cause the apparatus to: receive second audio content; and based on one or more second voice characteristics associated with the second audio content that indicate that the second audio content is associated with one or more of the one or more authorized users, lower the wake word threshold, from the increased wake word threshold, for processing the second audio content.
 42. The apparatus of claim 36, wherein the one or more operational commands are associated with a target device and wherein the processor-executable instructions that, when executed by the one or more processors, cause the apparatus to cause execution of the one or more operational commands, cause the apparatus to send, to the target device, the one or more operational commands.